Parallel matrix transpose algorithms on distributed memory concurrent computers



Related resources

Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD...
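
As a rough illustration of why the GCD of P and Q matters here, the sketch below assumes the usual block scattered (block-cyclic) mapping, in which block (I, J) lives on process (I mod P, J mod Q). The function names `owner` and `transpose_destinations` are hypothetical and are not taken from the paper; the code only enumerates which processes a given process would have to send blocks to in a transpose, and does not reproduce the paper's communication schemes.

```python
from math import gcd


def owner(I, J, P, Q):
    """Process coordinates owning block (I, J) under an assumed
    block scattered (block-cyclic) mapping on a P x Q template."""
    return (I % P, J % Q)


def transpose_destinations(p, q, P, Q):
    """Set of processes that receive blocks from process (p, q)
    when the matrix is transposed.

    Block (I, J) of A becomes block (J, I) of the transpose, so its
    new owner is owner(J, I). Scanning one LCM(P, Q) period of block
    indices is enough to see every destination.
    """
    lcm = P * Q // gcd(P, Q)
    dests = set()
    for I in range(p, lcm, P):      # block rows held by process row p
        for J in range(q, lcm, Q):  # block columns held by process column q
            dests.add(owner(J, I, P, Q))
    return dests


if __name__ == "__main__":
    for P, Q in [(2, 2), (2, 3), (4, 6)]:
        n = len(transpose_destinations(0, 0, P, Q))
        print(f"P={P}, Q={Q}, gcd={gcd(P, Q)}: {n} destination(s)")
```

Under this assumed mapping, each process has (P / g) · (Q / g) destinations, where g = GCD(P, Q): a single partner when P = Q, and every process in the template when P and Q are relatively prime.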


Pumma: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers

This paper describes the Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A · B, but also transposed multiplication routines C = A^T · B, C = A · B^T, and C = A^T · B^T, for a block scattered data distribution. The routines perform efficiently for a wide ...


Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers

Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α / log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DM...


A new parallel matrix multiplication algorithm on distributed-memory concurrent computers

We present a new fast and scalable matrix multiplication algorithm, called DIMMA (Distribution-Independent Matrix Multiplication Algorithm), for block cyclic data distribution on distributed-memory concurrent computers. The algorithm is based on two new ideas: it uses a modified pipelined communication scheme to overlap computation and communication effectively, and exploits the LCM block concept...


Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers

Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. A good data distribution scheme should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this...



Journal

Journal title: Parallel Computing

Year: 1995

ISSN: 0167-8191

DOI: 10.1016/0167-8191(95)00016-h